[LTX2] better timing and profiling capabilities #389
Open
c2eae2f to
6bd35bf
Collaborator
@mbohlool Could you add a table showing the latency gain (single video and amortized throughput) of this change compared with the baseline (main)? Thanks!
caaef98 to
6942969
Collaborator
Author
@Perseus14 I changed the PR to focus only on the timing and profiling part; the performance tweaking will be explored later. PTAL.
Perseus14
reviewed
May 2, 2026
🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
🤖 I'm sorry @Perseus14, but I was unable to process your request. Please see the logs for more details.
Perseus14
approved these changes
May 4, 2026
Description:
This PR introduces better timing and profiling capabilities to the LTX2 generation pipeline to help identify performance bottlenecks.
Key Changes:
Detailed Timing: Added time.perf_counter() blocks and jax.block_until_ready() calls across the pipeline to accurately measure text encoding, connector passes, denoising steps, VAE decoding, and post-processing.
Multi-Pass Execution: Updated generate_ltx2.py to support a three-stage execution flow:
Warmup Pass: For JIT compilation.
Generation Pass: For actual output and standard timing.
Profiling Pass: (Optional) Captured via max_utils.Profiler for a subset of steps.
Enhanced Logging: Added a summary table for Load, Compile, and Inference times.
Config Updates: Added skip_first_n_steps_for_profiler and profiler_steps to the LTX2 configuration.
Memory Management: Explicitly deletes large tensors (out, videos, audios) before the profiling run to prevent OOM.
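The timing pattern and the warmup/generation split described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `timed` and `toy_denoise_step` are hypothetical names, the toy function merely stands in for a pipeline stage, and the optional profiling pass (which the PR runs through max_utils.Profiler) is left out.

```python
import time

import jax
import jax.numpy as jnp


def timed(label, fn, *args):
    """Run fn(*args), wait for the device to finish, return (result, seconds)."""
    start = time.perf_counter()
    out = fn(*args)
    # JAX dispatches work asynchronously; without this call the timer would
    # stop before the computation has actually completed.
    out = jax.block_until_ready(out)
    elapsed = time.perf_counter() - start
    print(f"{label:<12} {elapsed:8.4f} s")
    return out, elapsed


@jax.jit
def toy_denoise_step(latents):
    # Hypothetical stand-in for one pipeline stage; this is not the LTX2 model.
    return jnp.tanh(latents @ latents.T).sum(axis=0)


latents = jnp.ones((256, 256), dtype=jnp.float32)

# Warmup pass: the first call includes tracing and XLA compilation.
warm_out, compile_s = timed("warmup", toy_denoise_step, latents)
# Release the warmup output; the PR similarly deletes large tensors
# before its profiling run.
del warm_out

# Generation pass: same input shape, so the compiled executable is reused.
result, inference_s = timed("generation", toy_denoise_step, latents)
```

Timing the warmup call separately keeps one-time compilation cost out of the inference number, which is presumably why the summary table reports Compile and Inference as distinct rows.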